Computer-Aided Grammatical Tagging of Spoken English

Author

  • Jan Svartvik
Abstract

The paper presents an outline of a system for grammatical tagging of the London-Lund Corpus of spoken English, which consists of some 450 000 words. The material, all of which will be available on magnetic computer tape and part of which is already available in both machine-readable and printed form, has been transcribed orthographically with prosodic marking for tone units, nuclei, stresses, pauses, etc (see Samples 1 and 2). Whereas there is now considerable agreement on the usefulness of a tagged corpus, there is as yet no consensus on the best type of tagging, let alone on the procedure involved. The analysis proposed here is specifically aimed at tagging spoken English, but should be largely applicable to written English as well.

The syntactic tagging will initially be based on surface properties, since we are interested in gaining information that is directly available through the signals hearers use for decoding a message, ie their perceptual strategies. In this respect the plan is no innovation. One computer discourse model intended "to tackle problems that a speaker evidently tackles" has recently been reported by Davey (1978: 4). His model, however, is designed to produce discourse, not to understand it. Another, more important difference between the SSE system on the one hand and the Davey model and most other computer discourse models on the other is that the latter have been devised to handle restricted and artificial universes of discourse, such as describing games or moving blocks. The work of Winograd (1972), however, is directly relevant to our task, since it deals with wider aspects of language and makes impressive use of Halliday's systemic grammar for producing parsing algorithms.

One of our aims is to make the tagging procedure as automatic as possible. Specifically, we would like to see how far it is possible to carry out syntactic analysis based on graphic words and prosody (provided by the material) and word class tags (provided by a general-purpose dictionary).
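The three inputs named in the last sentence (graphic words and prosody from the material, word class tags from a dictionary) can be pictured as a small data model. The sketch below is only an illustration under stated assumptions: the toy `LEXICON` is a hypothetical stand-in for the general-purpose dictionary, its tag names are invented, and the transcription notation (`#` ending a tone unit, `*` marking the nucleus word) is a made-up simplification, not the corpus's actual prosodic markup. Tags are assigned to word types, so every token of a type receives the same candidate set; tokens left with zero or several candidates are the ones an interactive, semi-manual pass would have to resolve.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon standing in for the general-purpose dictionary:
# each word *type* maps to the set of word-class tags it may carry.
# Tag names are invented for illustration.
LEXICON: dict[str, set[str]] = {
    "the": {"ART"},
    "nice": {"ADJ"},
    "little": {"ADJ", "ADV"},
    "dog": {"N", "V"},
    "will": {"AUX", "N"},
    "be": {"AUX", "V"},
    "doing": {"V"},
}


@dataclass
class Token:
    word: str       # graphic word, lower-cased
    nucleus: bool   # True if this word carries the tone-unit nucleus
    tags: set[str]  # candidate word-class tags looked up for the word type


def tone_units(transcription: str) -> list[list[Token]]:
    """Split a simplified transcription into tone units of tagged tokens.

    Assumed notation (hypothetical): '#' ends a tone unit and a leading
    '*' marks the nucleus word.
    """
    units: list[list[Token]] = []
    for chunk in transcription.split("#"):
        words = chunk.split()
        if not words:
            continue  # skip the empty text after the final '#'
        unit = []
        for w in words:
            form = w.lstrip("*").lower()
            # Type lookup: every token of a type gets the same candidate set.
            # Unknown words get an empty set and are left for the analyst.
            unit.append(Token(form, w.startswith("*"), set(LEXICON.get(form, set()))))
        units.append(unit)
    return units


def needs_disambiguation(unit: list[Token]) -> list[str]:
    """Tokens with zero or several candidate tags, to be resolved interactively."""
    return [t.word for t in unit if len(t.tags) != 1]
```

For example, `tone_units("the nice little *dog # will be *doing #")` yields two tone units, and `needs_disambiguation` on the first returns `['little', 'dog']`, the two words whose dictionary entries allow more than one word class.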
Given that no fully automatic system for grammatical tagging exists, we propose to implement an interactive, semi-manual mode of analysis. The paper will present word class tagging of types from the Longman Dictionary of Contemporary English, disambiguation of tokens, and phrase tagging by means of a set of parsing algorithms.

The basic unit of analysis will be the tone unit. In a previous study of Survey material of spoken English, it was found that the overall average length of a tone unit was 5.3 words and that "there was considerable correlation between the length of tone units and their grammatical contents", with a "high degree of co-extensiveness between tone units and grammatical units of group, phrase, and clause structure" (Quirk et al 1964).

The search for grammatical phrases will be from right to left within the tone unit. Since this search sequence is definitely unorthodox, some explanation may be called for. By and large, English phrases have the head on the right, as in

    Verb phrases:       will be DOING
    Noun phrases:       the nice little DOG
    Adjective phrases:  stunningly BEAUTIFUL

Assuming that a good number of tone units consist of at least grammatical phrases, the nucleus will occur within the phrase and, more often than not, within the head of the phrase. It is therefore likely that searching from right to left will be linguistically rewarding as well as computationally economical. A left-to-right search method also seems to run into difficulties with left-recursive structures and with predicting numerous alternatives.

The phrase recognition rules are to be applied in the following order:

    (VPH) Verb phrases
    (APH) Adverb phrases
    (JPH) Adjective phrases
    (NPH) Noun phrases
    (PPH) Prepositional phrases
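The right-to-left phrase search and the rule order can be illustrated with a toy recognizer. This is a minimal sketch, not the paper's parsing algorithms, which the abstract does not spell out: it assumes every token has already been disambiguated to a single tag, and the tag names (`AUX`, `V`, `ADV`, `ADJ`, `ART`, `N`, `PREP`) and the head and premodifier sets in the hypothetical `RIGHT_HEADED` table are invented. Each rule scans the tone unit from right to left; on meeting a head it absorbs the contiguous premodifiers to its left, so phrases built by earlier rules (an APH inside a JPH, a JPH inside an NPH) become material for later ones. Because prepositional phrases are not right-headed, PPH is treated here, as an assumption, as a preposition immediately followed by an already recognized NPH.

```python
from dataclasses import dataclass


@dataclass
class Node:
    label: str        # word-class tag (e.g. "N") or phrase label (e.g. "NPH")
    words: list[str]  # graphic words covered by this node


# Hypothetical toy rules: (phrase label, head labels, premodifier labels),
# in the order given in the text. Premodifiers may be phrases built by
# earlier rules (APH inside JPH, JPH inside NPH).
RIGHT_HEADED = [
    ("VPH", {"V"}, {"AUX"}),
    ("APH", {"ADV"}, {"ADV"}),
    ("JPH", {"ADJ"}, {"APH"}),
    ("NPH", {"N"}, {"ART", "JPH", "N"}),
]


def apply_rule(items: list[Node], label: str, heads: set[str], premods: set[str]) -> list[Node]:
    """Scan right to left; each head absorbs contiguous premodifiers to its left."""
    out: list[Node] = []
    i = len(items) - 1
    while i >= 0:
        if items[i].label in heads:
            start = i
            while start > 0 and items[start - 1].label in premods:
                start -= 1
            covered = [w for node in items[start:i + 1] for w in node.words]
            out.append(Node(label, covered))
            i = start - 1  # continue leftwards past the new phrase
        else:
            out.append(items[i])
            i -= 1
    out.reverse()  # restore left-to-right order
    return out


def apply_pph(items: list[Node]) -> list[Node]:
    """PPH: a preposition immediately followed by an NPH, found right to left."""
    out: list[Node] = []
    i = len(items) - 1
    while i >= 0:
        if items[i].label == "NPH" and i > 0 and items[i - 1].label == "PREP":
            out.append(Node("PPH", items[i - 1].words + items[i].words))
            i -= 2
        else:
            out.append(items[i])
            i -= 1
    out.reverse()
    return out


def parse_tone_unit(tagged: list[tuple[str, str]]) -> list[Node]:
    """Apply VPH, APH, JPH, NPH, then PPH to one disambiguated tone unit."""
    items = [Node(tag, [word]) for word, tag in tagged]
    for label, heads, premods in RIGHT_HEADED:
        items = apply_rule(items, label, heads, premods)
    return apply_pph(items)


def show(nodes: list[Node]) -> str:
    return " ".join(f"[{n.label} {' '.join(n.words)}]" for n in nodes)
```

With these rules, `show(parse_tone_unit([("in", "PREP"), ("the", "ART"), ("garden", "N")]))` gives `[PPH in the garden]`. Running JPH before NPH means that "the nice little dog" first yields two one-word JPHs, which NPH then absorbs into a single noun phrase headed by "dog".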


Publication date: 1980